ABSTRACT:

An embodiment of the invention includes a method of forming a control-dataflow graph 
that includes separating a control flow graph into two or more basic blocks, and 
converting said two or more basic blocks into code blocks, where the code blocks are 
formed into the control-dataflow graph. Another embodiment of the invention includes a 
method of forming a control-dataflow graph that includes separating a control flow graph 
into two or more basic blocks, forming a load node in at least one of said basic blocks,
forming a store node in at least one of said code blocks, inserting a delay node in at 
least one of said code blocks, segregating external hardware logic modules from said 
control flow graph, and converting said two or more basic blocks into code blocks, 
wherein the code blocks are formed into the control-dataflow graph. 
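As a rough illustration of the first step named in this abstract (separating a control flow into basic blocks), the sketch below uses the textbook "leader" rule. The instruction encoding, the opcode names `br` and `jmp`, and the function name `split_basic_blocks` are assumptions made for the example; the abstract does not specify them, and the conversion into code blocks is not modeled.

```python
def split_basic_blocks(instrs):
    """Partition a list of (op, target) instructions into basic blocks.

    Hypothetical encoding: 'br' and 'jmp' carry an integer target index.
    A leader is the first instruction, any branch target, and any
    instruction that directly follows a branch.
    """
    leaders = {0}
    for i, (op, target) in enumerate(instrs):
        if op in ("br", "jmp"):
            leaders.add(target)
            if i + 1 < len(instrs):
                leaders.add(i + 1)
    order = sorted(leaders)
    # Each block runs from one leader up to (not including) the next.
    return [instrs[s:e] for s, e in zip(order, order[1:] + [len(instrs)])]


program = [
    ("load", None),   # 0
    ("br", 4),        # 1: conditional branch to index 4
    ("add", None),    # 2
    ("jmp", 5),       # 3
    ("sub", None),    # 4
    ("store", None),  # 5
]
blocks = split_basic_blocks(program)
```

Each resulting block has a single entry and a single exit, which is the property a later block-by-block conversion would rely on.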
ABSTRACT:

A system and method for compiling computer code written to conform to a high-level
language standard to generate a unified executable containing the hardware logic for a
reconfigurable processor, the instructions for a traditional processor (instruction
processor), and the associated support code for managing execution on a hybrid hardware
platform. Explicit knowledge of writing hardware-level design code is not required since
the problem can be represented in a high-level language syntax. A top-level driver
invokes a standard-conforming compiler that provides syntactic and semantic analysis.
The driver invokes a compilation phase that translates the CFG representation being
generated into a hybrid control flow-dataflow graph representation representing optimized
pipelined logic which may be processed into a hardware description representation. The
driver invokes a hardware description language (HDL) compiler to produce a netlist file
that can be used to start the place-and-route compilation needed to produce a bitstream
for the reconfigurable computer. The programming environment then provides support for
taking the output from the compilation driver and combining all the necessary components
together to produce a unified executable capable of running on both the instruction
processor and reconfigurable processor.
ABSTRACT:

In a dynamically compiling computer system, a system and method for efficiently
transferring control from execution of an instruction in a first representation to a 
second representation of the instruction is disclosed. The system and method include the 
setting of a tag for entry points of each instruction in a first representation that has 
been translated to a second representation. The tag is stored in memory in association 
with each such instruction. When a given instruction in a first representation is to be 
executed, the tag is examined, and if it indicates that a translated version of the 
instruction has previously been generated, control is passed to execution of the 
instruction in the second representation. The second representation can be a different 
instruction set representation, or an optimized representation in the same instruction 
set as the original instruction. 
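The tag check described in this abstract can be pictured with a small sketch. The class name `TaggedExecutor`, the dictionary-based tag store, and the callable representation of translated code are all assumptions for illustration; the abstract says only that a tag is stored in memory in association with each instruction.

```python
class TaggedExecutor:
    """Toy dispatcher: consult a per-entry-point tag before executing."""

    def __init__(self, interpret):
        self.interpret = interpret  # fallback for the first representation
        self.tag = {}               # entry address -> True once translated
        self.translated = {}        # entry address -> second representation

    def install(self, addr, fn):
        # Record the translated version and set the tag for its entry point.
        self.translated[addr] = fn
        self.tag[addr] = True

    def execute(self, addr, state):
        # Examine the tag; pass control to the translation if one exists.
        if self.tag.get(addr, False):
            return self.translated[addr](state)
        return self.interpret(addr, state)


ex = TaggedExecutor(lambda addr, state: ("interpreted", addr))
ex.install(0x40, lambda state: ("translated", 0x40))
```

Calling `ex.execute(0x40, {})` takes the translated path, while an untagged address such as `0x44` falls back to the interpreter.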
ABSTRACT:

An interpretation flow, a translation and optimization flow, and an original instruction 
prefetch flow are defined independently of one another. A processor is realized as a 
chip multiprocessor or realized so that one instruction execution control unit can 
process a plurality of processing flows simultaneously. The plurality of processing 
flows is processed in parallel with one another. Furthermore, within the translation and 
optimization flow, translated instructions are arranged to define a plurality of 
processing flows. Within the interpretation flow, when each instruction is interpreted, 
if a translated instruction corresponding to the instruction processed within the 
translation and optimization flow is present, the translated instruction is executed. 
According to the present invention, an overhead including translation and optimization 
that are performed in order to execute instructions oriented to an incompatible 
processor is minimized. At the same time, translated instructions are processed quickly, 
and a processor is operated at a high speed with low power consumption. Furthermore, an 
overhead of original instruction fetching is reduced. 
ABSTRACT:

Systems and methods for verifying execution of translated code operative on a host 
computer system different from the computer system designated for the original program 
code. In one arrangement, the system and method fetch program code, translate program 
code, emit the translated program code into at least one code cache, execute the 
translated code within the at least one code cache, interpret the program code, and 
compare a translator generated state with an interpreter generated state to confirm 
desired code execution. 
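A minimal sketch of the comparison step follows, assuming a toy two-operation instruction set (`set`, `add`) and a dictionary as machine state. The names `interpret`, `translate`, and `verify` are hypothetical, and a real translator would emit native code rather than Python closures.

```python
def interpret(program, state):
    """Reference path: execute one operation at a time."""
    s = dict(state)
    for op, reg, val in program:
        if op == "set":
            s[reg] = val
        elif op == "add":
            s[reg] = s.get(reg, 0) + val
    return s


def translate(program):
    """Toy translator: pre-build one closure per operation."""
    steps = []
    for op, reg, val in program:
        if op == "set":
            steps.append(lambda s, r=reg, v=val: s.__setitem__(r, v))
        elif op == "add":
            steps.append(lambda s, r=reg, v=val: s.__setitem__(r, s.get(r, 0) + v))

    def fragment(state):
        s = dict(state)
        for step in steps:
            step(s)
        return s

    return fragment


def verify(program, state, code_cache):
    """Emit into the cache if needed, run both paths, compare final states."""
    if program not in code_cache:
        code_cache[program] = translate(program)
    return code_cache[program](state) == interpret(program, state)


program = (("set", "r1", 5), ("add", "r1", 3))
cache = {}
ok = verify(program, {}, cache)
```

A mismatch between the two states would indicate that the translated fragment does not preserve the behavior of the original code.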
ABSTRACT:

The present disclosure relates to a system and method for emulating a computer system. 
In one arrangement, the system and method pertain to fetching program code, translating 
program code, emitting translated program code into at least one code cache, and 
executing translated code within the at least one code cache in lieu of associated 
program code when a semantic function of the associated program code is requested. 
Operation of the system and method can be facilitated with an application programming 
interface that, in one arrangement, can comprise a set of functions available to the 
translator including an emit fragment function with which the translator can emit code 
fragments into code caches of the dynamic execution layer interface, and an execute 
function with which the translator can request execution of code fragments contained 
within the at least one code cache. 
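The two API functions named in this abstract, an emit fragment function and an execute function, might look roughly like the sketch below. The class name `CodeCache`, the integer fragment tags, and the method signatures are assumptions; the abstract does not give the actual interface.

```python
class CodeCache:
    """Toy code cache exposing hypothetical emit/execute entry points."""

    def __init__(self):
        self._fragments = {}
        self._next_tag = 0

    def emit_fragment(self, fn):
        # Store a translated fragment and hand back a tag that names it.
        tag = self._next_tag
        self._next_tag += 1
        self._fragments[tag] = fn
        return tag

    def execute(self, tag, *args):
        # Run the cached fragment in lieu of the original program code.
        return self._fragments[tag](*args)


code_cache = CodeCache()
double_tag = code_cache.emit_fragment(lambda x: x * 2)
result = code_cache.execute(double_tag, 21)
```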






8. Document ID: US 20020199179 A1

L13: Entry 8 of 15    File: PGPB    Dec 26, 2002

PGPUB-DOCUMENT-NUMBER: 20020199179
PGPUB-FILING-TYPE: new

DOCUMENT-IDENTIFIER: US 20020199179 A1

TITLE: Method and apparatus for compiler-generated triggering of auxiliary codes 
PUBLICATION-DATE: December 26, 2002 
INVENTOR-INFORMATION:

NAME                    CITY          STATE   COUNTRY   RULE-47
Lavery, Daniel M.       Santa Clara   CA      US
Wang, Hong              Fremont       CA      US
Hoflehner, Gerolf F.    Santa Clara   CA      US
Liao, Shih-wei          Sunnyvale     CA      US
Shen, John              San Jose      CA      US
Grochowski, Edward T.   San Jose      CA      US
Sehr, David C.          Sunnyvale     CA      US
Fang, Jesse Z.          San Jose      CA      US


US-CL-CURRENT: 717/158 
ABSTRACT:

A method for executing a code is provided. The method includes receiving a trigger 
instruction, selecting an entry in a trigger table, the entry associated with the 
trigger instruction, and executing an auxiliary code referenced by the entry in the 
trigger table. 
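The three steps of this method (receive a trigger, select a table entry, execute the referenced auxiliary code) can be sketched as follows. The dictionary-based table and the names `register_trigger` and `on_trigger` are assumptions for illustration only.

```python
trigger_table = {}  # trigger id -> auxiliary code (hypothetical layout)


def register_trigger(trigger_id, aux_code):
    """Associate an entry in the trigger table with a trigger instruction."""
    trigger_table[trigger_id] = aux_code


def on_trigger(trigger_id, context):
    """Select the entry for the trigger and run the code it references."""
    aux_code = trigger_table.get(trigger_id)
    if aux_code is None:
        return None  # no entry: the trigger is ignored in this sketch
    return aux_code(context)


register_trigger(7, lambda ctx: "prefetch " + ctx)
outcome = on_trigger(7, "array_a")
```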
ABSTRACT:

An apparatus comprising a circuit configured to (i) translate one or more instruction 
codes of a first instruction set into a sequence of instruction codes of a second 
instruction set and (ii) present the sequence of instruction codes of the second 
instruction set in response to a predetermined number of addresses. 

22 Claims, 12 Drawing figures 
Exemplary Claim Number: 1 
Number of Drawing Sheets: 7 
ABSTRACT:

An optimizing object code translation system and method perform dynamic compilation and
translation of a target object code on a source operating system while performing
optimization. Compilation and optimization of the target code is dynamically executed in
real time. A compiler performs analysis and optimizations that improve emulation
relative to template-based translation and interpretation such that a host processor
which processes larger order instructions, such as 32-bit instructions, may emulate a
target processor which processes smaller order instructions, such as 16-bit and 8-bit
instructions. The optimizing object code translator does not require knowledge of a
static program flow graph or memory locations of target instructions prior to run time.
In addition, the optimizing object code translator does not require knowledge of the
location of all join points into the target object code prior to execution. During
program execution, a translator records branch operations. The logging of information
identifies instructions and instruction join points. When a number of times a branch
operation is executed exceeds a threshold, the destination of the branch becomes a seed
for compilation and code portions between seeds are defined as segments. A segment may
be incomplete allowing for modification or replacement to account for a new flow of
program control during real time program execution.

32 Claims, 37 Drawing figures 
Exemplary Claim Number: 1 
Number of Drawing Sheets: 19 
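The seed and segment rule in the abstract above can be sketched in a few lines. The class name `BranchLog`, the use of plain integers as addresses, and the explicit `end` bound passed to `segments` are assumptions; the abstract does not describe the actual data structures.

```python
from collections import Counter


class BranchLog:
    """Count branch destinations; promote frequently taken ones to seeds."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()
        self.seeds = set()

    def record(self, dest):
        # Log one executed branch to `dest`.
        self.counts[dest] += 1
        if self.counts[dest] > self.threshold:
            self.seeds.add(dest)

    def segments(self, end):
        """Return (start, stop) address ranges between consecutive seeds."""
        ordered = sorted(self.seeds)
        return list(zip(ordered, ordered[1:] + [end]))


log = BranchLog(threshold=2)
for dest in [0x10, 0x10, 0x10, 0x30, 0x30, 0x30, 0x20]:
    log.record(dest)
```

Here `0x10` and `0x30` each exceed the threshold and become seeds, while `0x20` does not; `log.segments(0x40)` then yields the two ranges between them.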






11. Document ID: US 6397379 B1

L13: Entry 11 of 15    File: USPT    May 28, 2002

US-PAT-NO: 6397379

DOCUMENT-IDENTIFIER: US 6397379 B1

TITLE: Recording in a program execution profile references to a memory-mapped active 
device 

DATE-ISSUED: May 28, 2002 



INVENTOR-INFORMATION:

NAME                  CITY          STATE   ZIP CODE   COUNTRY
Yates, Jr.; John S.   Needham       MA
Reese; David L.       Westborough   MA
Van Dyke; Korbin S.   Sunol         CA


US-CL-CURRENT: 717/140 



ABSTRACT:

A method and a computer for execution of the method. As part of executing a stream of
instructions, a series of memory loads is issued from a computer CPU to a bus, some
directed to well-behaved memory and some directed to non-well-behaved devices in I/O
space. The computer addresses of instructions of the stream that issued memory loads to
the non-well-behaved memory are stored; the storage form of the recording allows
determination of whether a memory load was to well-behaved memory or non-well-behaved
memory without resolution of any memory address stored in the recording.



47 Claims, 5 Drawing figures 
Exemplary Claim Number: 1 
Number of Drawing Sheets: 41 
ABSTRACT:

The invention involves new microarchitecture apparatus and methods for superscalar
microprocessors that support multi-instruction issue, decoupled dataflow scheduling,
out-of-order execution, register renaming, multi-level speculative execution, and
precise interrupts. These are the Distributed Instruction Queue (DIQ) and the Modified
Reorder Buffer (MRB). The DIQ is a new distributed instruction shelving technique that
is an alternative to the reservation station (RS) technique and offers a more efficient
(improved performance/cost) implementation. The Modified Reorder Buffer (MRB) is an
improved reorder buffer (RB) result shelving technique that eliminates the slow and
expensive prioritized associative lookup, shared global buses, and dummy branch entries
(to reduce entry usage). The MRB has an associative key unit which uses a unique
associative key.

14 Claims, 40 Drawing figures 
Exemplary Claim Number: 1 
Number of Drawing Sheets: 40 
ABSTRACT:

The present invention is a system, method, and product for improving the speed of 
dynamic translation systems by efficiently positioning translated instructions in a 
computer memory unit. More specifically, the speed of execution of translated 
instructions, which is a factor of particular relevance to dynamic optimization systems, 
may be adversely affected by inefficient jumping between traces of translated 
instructions. The present invention efficiently positions the traces with respect to 
each other and with respect to "trampoline" instructions that redirect control flow from 
the traces. For example, trampoline instructions may redirect control flow to an 
instruction emulator if the target instruction has not been translated, or to the 
translation of a target instruction that has been translated. When a target instruction 
has been translated, a backpatcher of the invention may directly backpatch the jump to 
the target so that the trampoline instructions are no longer needed. A method of the 
present invention includes: (1) designating "chunks" of memory locations, and (2) 
positioning a translated trace and its corresponding trampoline instructions in the same 
chunk. The size of the chunk generally is based on a "machine-specific shortest jump 
distance" that is the shortest maximum distance that a jump instruction may specify. In 
particular, the chunk length may be determined so that, for every translated trace and 
trampoline instruction positioned in the same chunk, the greatest distance between a 
translated jump instruction and its target trampoline instruction is not greater than 
the machine-specific shortest jump distance for that type of jump instruction. 

72 Claims, 6 Drawing figures 
Exemplary Claim Number: 1 
Number of Drawing Sheets: 6 
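The chunk rule in the abstract above can be illustrated with a simple placement routine. This sketch assumes the chunk length equals the machine-specific shortest jump distance (the abstract says only that the size is "generally based on" it), and the function name `place_traces` and the byte-size inputs are hypothetical.

```python
def place_traces(trace_sizes, trampoline_sizes, shortest_jump):
    """Assign each trace plus its trampolines to one chunk.

    Returns a (chunk_index, offset) pair per trace. Because a trace and
    its trampolines always share a chunk of length `shortest_jump`, a jump
    from the trace to its trampoline never spans more than that distance.
    """
    placements, chunk, used = [], 0, 0
    for trace, tramp in zip(trace_sizes, trampoline_sizes):
        need = trace + tramp
        if need > shortest_jump:
            raise ValueError("trace and trampolines exceed one chunk")
        if used + need > shortest_jump:
            chunk, used = chunk + 1, 0  # start a fresh chunk
        placements.append((chunk, used))
        used += need
    return placements


layout = place_traces([40, 50, 30], [10, 10, 10], shortest_jump=128)
```

The third trace does not fit in the space left in the first chunk, so it opens a second one rather than being separated from its trampolines.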
ABSTRACT:

A computer-implemented system, method, and product are provided to designate and 
translate traces of original instructions of an executable file at run time based on 
dynamic evaluation of control flow through frequently executed traces of instructions. 
Such designation typically reduces unnecessary translations and optimizations, and 
thereby increases execution speed and reduces the usage of memory and other resources. 
The invention includes a hot trace identifier to identify frequently executed traces of 
instructions and a hot trace instrumenter to instrument such frequently executed traces 
so that control flow through them may be recorded. If the amount or rate of control flow 
through a frequently executed trace exceeds a threshold value, a hot trace selector is 
invoked to select a hot trace of original instructions including those of the frequently 
executed trace. The hot trace may be dynamically optimized. The system, method, and
product also provide for the continuous recording of control flow through hot traces. If
control flow has changed during execution, such that the amount or rate of control flow
through a hot trace falls below a threshold value, the trace may be removed.

69 Claims, 15 Drawing figures 
Exemplary Claim Number: 1 
Number of Drawing Sheets: 13 
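The selection and removal behavior described in the abstract above can be sketched with two thresholds applied per measurement interval. The class name `HotTraceSelector`, the separate `hot` and `cold` thresholds, and the per-interval count dictionary are assumptions; the abstract mentions threshold values without saying how they are chosen or measured.

```python
class HotTraceSelector:
    """Promote traces whose flow exceeds `hot`; drop those below `cold`."""

    def __init__(self, hot, cold):
        self.hot = hot
        self.cold = cold
        self.hot_traces = set()

    def update(self, flow_counts):
        """flow_counts maps trace id -> control-flow count for one interval."""
        for trace, count in flow_counts.items():
            if count > self.hot:
                self.hot_traces.add(trace)
        # Continuous recording: remove traces whose flow has fallen off.
        for trace in list(self.hot_traces):
            if flow_counts.get(trace, 0) < self.cold:
                self.hot_traces.discard(trace)
        return set(self.hot_traces)


selector = HotTraceSelector(hot=100, cold=10)
first = selector.update({"A": 150, "B": 20})
second = selector.update({"A": 5, "B": 120})
```

In the second interval trace A cools off and is removed, while trace B crosses the hot threshold and is selected.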
ABSTRACT:

A distributed instruction queue (DIQ) in a superscalar microprocessor supports multi- 
instruction issue, decoupled data flow scheduling, out-of-order execution, register 
renaming, multi-level speculative execution, and precise interrupts. The DIQ provides 
distributed instruction shelving without storing register values, operand value copying, 
and result value forwarding, and supports in-order issue as well as out-of-order issue 
within its functional unit. The DIQ allows a reduction in the number of global wires and
replacement with private-local wires in the processor. The DIQ's number of global wires
remains the same as the number of DIQ entries and data size increases. The DIQ maintains
maximum machine parallelism and the actual performance of the microprocessor using the 
DIQ is better due to reduced cycle time or more operations executed per cycle. 

22 Claims, 40 Drawing figures 
Exemplary Claim Number: 1 
Number of Drawing Sheets: 40 
1 Dynamic native optimization of interpreters

Gregory T. Sullivan, Derek L. Bruening, Iris Baron, Timothy Garnett, Saman Amarasinghe
June 2003 Proceedings of the 2003 workshop on Interpreters, Virtual Machines and
Emulators

Full text available: pdf(150.25 KB) Additional Information: full citation, abstract, references, index terms

For domain specific languages, "scripting languages", dynamic languages, and for virtual
machine-based languages, the most straightforward implementation strategy is to write an
interpreter. A simple interpreter consists of a loop that fetches the next bytecode,
dispatches to the routine handling that bytecode, then loops. There are many ways to 
improve upon this simple mechanism, but as long as the execution of the program is driven 
by a representation of the program other than as a stream of n ... 
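The simple interpreter loop described in this abstract (fetch the next bytecode, dispatch to its handler, loop) can be sketched as below. The three-opcode stack machine and the function name `run` are assumptions made for the example, not part of the cited paper.

```python
def run(bytecode):
    """Fetch, dispatch, loop: a toy stack-machine interpreter."""
    stack = []
    handlers = {
        "push": lambda arg: stack.append(arg),
        "add": lambda arg: stack.append(stack.pop() + stack.pop()),
        "mul": lambda arg: stack.append(stack.pop() * stack.pop()),
    }
    pc = 0
    while pc < len(bytecode):
        op, arg = bytecode[pc]  # fetch the next bytecode
        handlers[op](arg)       # dispatch to the routine handling it
        pc += 1                 # then loop
    return stack


final_stack = run([("push", 2), ("push", 3), ("add", None),
                   ("push", 4), ("mul", None)])
```

Every bytecode pays for the fetch and the dictionary lookup, which is the dispatch overhead that the optimizations surveyed in this listing aim to reduce.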

2 Dynamic translation: The Transmeta Code Morphing™ Software: using speculation,
recovery, and adaptive retranslation to address real-life challenges
James C. Dehnert, Brian K. Grant, John P. Banning, Richard Johnson, Thomas Kistler,
Alexander Klaiber, Jim Mattson

March 2003 Proceedings of the international symposium on Code generation and
optimization: feedback-directed and runtime optimization

Full text available: pdf(988.25 KB) Additional Information: full citation, abstract, references

Publisher Site



Transmeta's Crusoe microprocessor is a full, system-level implementation of the x86 
architecture, comprising a native VLIW microprocessor with a software layer, the Code 
Morphing Software (CMS), that combines an interpreter, dynamic binary translator, 
optimizer, and runtime system. In its general structure, CMS resembles other binary 
translation systems described in the literature, but it is unique in several respects. The wide 
range of PC workloads that CMS must handle gracefully in real ... 

Keywords: binary translation, dynamic optimization, dynamic translation, emulation, self- 
modifying code, speculation 



3 Machine-adaptable dynamic binary translation
David Ung, Cristina Cifuentes

January 2000 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN workshop on
Dynamic and adaptive compilation and optimization, volume 35 issue 7
Full text available: pdf(1.23 MB) Additional Information: full citation, abstract, references, citings, index terms

Dynamic binary translation is the process of translating and optimizing executable code for
one machine to another at runtime, while the program is "executing" on the target machine.

Dynamic translation techniques have normally been limited to two particular machines; a 
competitor's machine and the hardware manufacturer's machine. This research provides for 
a more general framework for dynamic translations, by providing a framework based on 
specifications of machines that ... 

Keywords: binary translation, dynamic compilation, dynamic execution, emulation, 
interpretation 



4 Hardware Support for Control Transfers in Code Caches
Ho-Seop Kim, James E. Smith

December 2003 Proceedings of the 36th Annual IEEE/ACM International Symposium on
Microarchitecture

Full text available: pdf(315.74 KB) Additional Information: full citation, abstract

Publisher Site

Many dynamic optimization and/or binary translation systems hold optimized/translated
superblocks in a code cache. Conventional code caching systems suffer from overheads when
control is transferred from one cached superblock to another, especially via
register-indirect jumps. The basic problem is that instruction addresses in the code cache are
different from those in the original program binary. Therefore, performance for
register-indirect jumps depends on the ability to translate efficiently from sour ...

5 A brief history of just-in-time 
John Aycock 

June 2003 ACM Computing Surveys (CSUR), volume 35 issue 2 

Full text available: pdf(171.09 KB) Additional Information: full citation , abstract , references , index terms 

Software systems have been using "just-in-time" compilation (JIT) techniques since the 
1960s. Broadly, JIT compilation includes any translation performed dynamically, after a 
program has started execution. We examine the motivation behind JIT compilation and 
constraints imposed on JIT compilation systems, and present a classification scheme for 
such systems. This classification emerges as we survey forty years of JIT work, from
1960-2000.

Keywords: Just-in-time compilation, dynamic compilation 



6 Optimization and precise exceptions in dynamic compilation
Michael Gschwind, Erik Altman

March 2001 ACM SIGARCH Computer Architecture News, volume 29 issue 1
Full text available: pdf(508.52 KB) Additional Information: full citation, abstract, index terms

Maintaining precise exceptions is an important aspect of achieving full compatibility with a 
legacy architecture. While asynchronous exceptions can be deferred to an appropriate 
boundary in the code, synchronous exceptions must be taken when they occur. This 
introduces uncertainty into liveness analysis since processor state that is otherwise dead 
may be exposed when an exception handler is invoked. Previous systems either had to 
sacrifice full compatibility to achieve more freedom to perform op ... 
7 Dynamo: a transparent dynamic optimization system
Vasanth Bala, Evelyn Duesterwald, Sanjeev Banerjia

May 2000 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2000 conference
on Programming language design and implementation, volume 35 issue 5
Full text available: pdf(156.03 KB) Additional Information: full citation, abstract, references, citings, index terms

We describe the design and implementation of Dynamo, a software dynamic optimization
system that is capable of transparently improving the performance of a native instruction
stream as it executes on the processor. The input native instruction stream to Dynamo can
be dynamically generated (by a JIT for example), or it can come from the execution of a 
statically compiled native binary. This paper evaluates the Dynamo system in the latter, 
more challenging situation, in order to emphasize the ... 

8 An out-of-order execution technique for runtime binary translators
Bich C. Le

October 1998 Proceedings of the eighth international conference on Architectural
support for programming languages and operating systems, volume 32, 33
issue 5, 11

Full text available: pdf(1.04 MB) Additional Information: full citation, abstract, references, citings, index terms

A dynamic translator emulates an instruction set architecture by translating source 
instructions to native code during execution. On statically-scheduled hardware, higher 
performance can potentially be achieved by reordering the translated instructions; however, 
this is a challenging transformation if the source architecture supports precise exception 
semantics, and the user-level program is allowed to register exception handlers. This paper 
presents a software technique which allows a translato ... 

9 Optimizations and oracle parallelism with dynamic translation
Kemal Ebcioglu, Erik R. Altman, Michael Gschwind, Sumedh Sathaye

November 1999 Proceedings of the 32nd annual ACM/IEEE international symposium on
Microarchitecture

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms
Publisher Site

We describe several optimizations which can be employed in a dynamic binary translation 
(DBT) system, where low compilation/translation overhead is essential. These optimizations 
achieve a high degree of ILP, sometimes even surpassing a static compiler employing more 
sophisticated, and more time-consuming algorithms [9]. We present results in which we 
employ these optimizations in a dynamic binary translation system capable of computing 
oracle parallelism. 

10 Profile-based optimizations: Dynamic trace selection using performance monitoring 
hardware sampling 

Howard Chen, Wei-Chung Hsu, Jiwei Lu, Pen-Chung Yew, Dong-Yuan Chen 

March 2003 Proceedings of the international symposium on Code generation and
optimization: feedback-directed and runtime optimization

Full text available: pdf(1.88 MB) Additional Information: full citation, abstract, references

Publisher Site

Optimizing programs at run-time provides opportunities to apply aggressive optimizations to 
programs based on information that was not available at compile time. At run time, 
programs can be adapted to better exploit architectural features, optimize the use of 
dynamic libraries, and simplify code based on run-time constants. Our profiling system 
provides a framework for collecting information required for performing run-time 
optimization. We sample the performance hardware registers available on ... 
11 Binary translation and architecture convergence issues for IBM system/390

Michael Gschwind, Kemal Ebcioglu, Erik Altman, Sumedh Sathaye

May 2000 Proceedings of the 14th international conference on Supercomputing

Full text available: pdf(1.44 MB) Additional Information: full citation, abstract, references, index terms

We describe the design issues in an implementation of the ESA/390 architecture based on
binary translation to a very long instruction word (VLIW) processor. During binary 
translation, complex ESA/390 instructions are decomposed into instruction "primitives" 
which are then scheduled onto a wide-issue machine. The aim is to achieve high instruction 
level parallelism due to the increased scheduling and optimization opportunities which can 
be exploited by binary translation software ...

12 A study of exception handling and its dynamic optimization in Java
Takeshi Ogasawara, Hideaki Komatsu, Toshio Nakatani

October 2001 ACM SIGPLAN Notices, Proceedings of the 16th ACM SIGPLAN
conference on Object oriented programming, systems, languages, and
applications, volume 36 issue 11

Full text available: pdf(190.18 KB) Additional Information: full citation, abstract, references, citings, index terms

Optimizing exception handling is critical for programs that frequently throw exceptions. We
observed that there are many such exception-intensive programs in various categories of
Java programs. There are two commonly used exception handling techniques: stack
unwinding optimizes the normal path, while stack cutting optimizes the exception handling
path. However, there has been no single exception handling technique to optimize both
paths.

13 Dynamic Adaptive compilation: An infrastructure for adaptive dynamic optimization
Derek Bruening, Timothy Garnett, Saman Amarasinghe

March 2003 Proceedings of the international symposium on Code generation and
optimization: feedback-directed and runtime optimization

Full text available: pdf Additional Information: full citation, abstract, references, citings

Publisher Site

Dynamic optimization is emerging as a promising approach to overcome many of the 
obstacles of traditional static compilation. But while there are a number of compiler 
infrastructures for developing static optimizations, there are very few for developing 
dynamic optimizations. We present a framework for implementing dynamic analyses and 
optimizations. We provide an interface for building external modules, or clients, for the 
DynamoRIO dynamic code modification system. This interface abstracts awa ...

14 Practicing JUDO: Java under dynamic optimizations
Michal Cierniak, Guei-Yuan Lueh, James M. Stichnoth

May 2000 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2000 conference
on Programming language design and implementation, volume 35 issue 5

Full text available: pdf(190.06 KB) Additional Information: full citation, abstract, references, citings, index terms

A high-performance implementation of a Java Virtual Machine (JVM) consists of efficient
implementation of Just-In-Time (JIT) compilation, exception handling, synchronization
mechanism, and garbage collection (GC). These components are tightly coupled to achieve
high performance. In this paper, we present some static and dynamic techniques
implemented in the JIT compilation and exception handling of the Microprocessor Research 
Lab Virtual Machine (MRL VM), ... 
15 Novel ideas: Performance characterization of a hardware mechanism for dynamic
optimization

Brian Fahs, Satarupa Bose, Matthew Crum, Brian Slechta, Francesco Spadini, Tony Tung,
Sanjay J. Patel, Steven S. Lumetta

December 2001 Proceedings of the 34th annual ACM/IEEE international symposium on
Microarchitecture

Full text available: pdf(1.31 MB) Additional Information: full citation, abstract, references, citings
Publisher Site

We evaluate the rePLay microarchitecture as a means for reducing application execution 
time by facilitating dynamic optimization. The framework contains a programmable 
optimization engine coupled with a hardware-based recovery mechanism. The optimization 
engine enables the dynamic optimizer to run concurrently with program execution. The 
recovery mechanism enables the optimizer to make speculative optimizations without 
requiring recovery code. We demonstrate that a rePLay configuration performing ... 

16 Sifting out the mud: low level C++ code reuse
Bjorn De Sutter, Bruno De Bus, Koen De Bosschere

November 2002 ACM SIGPLAN Notices, Proceedings of the 17th ACM SIGPLAN
conference on Object-oriented programming, systems, languages, and
applications, volume 37 issue 11

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms

More and more computers are being incorporated in devices where the available amount of 
memory is limited. This contrasts with the increasing need for additional functionality and 
the need for rapid application development. While object-oriented programming languages, 
providing mechanisms such as inheritance and templates, allow fast development of 
complex applications, they have a detrimental effect on program size. This paper introduces 
new techniques to reuse the code of whole procedures at t ... 

Keywords: code compaction, code size reduction 



17 A region-based compilation technique for a Java just-in-time compiler
Toshio Suganuma, Toshiaki Yasue, Toshio Nakatani

May 2003 ACM SIGPLAN Notices, Proceedings of the ACM SIGPLAN 2003 conference
on Programming language design and implementation, volume 38 issue 5
Full text available: pdf(158.62 KB) Additional Information: full citation, abstract, references, citings, index terms

Method inlining and data flow analysis are two major optimization components for effective
program transformations, however they often suffer from the existence of rarely or never 
executed code contained in the target method. One major problem lies in the assumption 
that the compilation unit is partitioned at method boundaries. This paper describes the 
design and implementation of a region-based compilation technique in our dynamic 
compilation system, in which the compiled regions are selected a ... 

Keywords: dynamic compilers, on-stack replacement, partial inlining, region-based 
compilation 



18 Compilation and run-time systems: Vacuum packing: extracting hardware-detected 
program phases for post-link optimization 

Ronald D. Barnes, Erik M. Nystrom, Matthew C. Merten, Wen-mei W. Hwu
November 2002 Proceedings of the 35th annual ACM/IEEE international symposium on
Microarchitecture

Full text available: pdf(1.26 MB) Additional Information: full citation, abstract, references, index terms
Publisher Site

This paper presents Vacuum Packing, a new approach to profile-based program
optimization. Instead of using traditional aggregate or summarized execution profile 
weights, this approach uses a transparent hardware profiler to automatically detect 
execution phases and record branch profile information for each new phase. The code 
extraction algorithm then produces code packages that are specially formed for their 
corresponding phases. The algorithm compensates for the incomplete and often 
incoheren ... 



19 Optimising hot paths in a dynamic binary translator
David Ung, Cristina Cifuentes

March 2001 ACM SIGARCH Computer Architecture News, volume 29 issue 1
Full text available: pdf(890.10 KB) Additional Information: full citation, abstract, index terms

In dynamic binary translation, code is translated "on the fly" at run-time, while the user
perceives ordinary execution of the program on the target machine. Code fragments that
are frequently executed follow the same sequence of flow control over a period of time. 
These fragments form a hot path and are optimised to improve the overall performance of 
the program. Multiple hot paths may also exist in programs. A program may choose to 
execute in one hot path for some time, but later switch to anot ... 

Keywords: binary translation, dynamic compilation, dynamic execution, run-time profiling 



20 Compilation and run-time systems: DELI: a new run-time control point
Giuseppe Desoli, Nikolay Mateev, Evelyn Duesterwald, Paolo Faraboschi, Joseph A. Fisher
November 2002 Proceedings of the 35th annual ACM/IEEE international symposium on
Microarchitecture

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms
Publisher Site

The Dynamic Execution Layer Interface (DELI) offers the following unique capability: it
provides fine-grain control over the execution of programs, by allowing its clients to observe
and optionally manipulate every single instruction, at run time, just before it runs. DELI
accomplishes this by opening up an interface to the layer between the execution of software 
and hardware. To avoid the slowdown, DELI caches a private copy of the executed code and 
always runs out of its own private cache. In ... 